Overpruning Large Decision Trees
Author
Abstract
This paper presents empirical evidence for five hypotheses about learning from large noisy domains: that trees built from very large training sets are larger and more accurate than trees built from subsets, even large ones; that this increased accuracy is only in part due to the extra size of the trees; and that the extra training instances allow both better choices of attribute while building the tree, and better choices of the subtrees to prune after it has been built. For the practitioner with the common goals of maximising the accuracy and minimising the size of induced trees, these conclusions prompt new techniques for induction on large training sets. Although building huge trees from huge training sets is computationally expensive, pruning smaller trees on them is not, yet it improves accuracy. Where a pruned tree is considered too large for human or machine limitations, it can be overpruned to an acceptable size. Although this requires far more time than building a tree of that size from a correspondingly small training set, it will usually be more accurate. The paper also describes an algorithm for overpruning trees to user-specified size limits; it is evaluated in the course of testing the above hypotheses.
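As a rough illustration of the idea rather than the paper's algorithm, the sketch below prunes a tree grown on a large training set down to a user-specified node limit, using scikit-learn's cost-complexity pruning path as a stand-in pruning criterion. The synthetic dataset, the 31-node limit, and the overprune helper are illustrative assumptions; a faithful implementation would prune the already-built tree in place rather than refit at each pruning level.

```python
# Minimal sketch of "overpruning" a decision tree to a user-specified size limit.
# Assumptions: synthetic data, sklearn's cost-complexity pruning as the pruning
# criterion, and refitting at each candidate level for simplicity.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow a large tree on the (here: synthetic) training set.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Candidate pruning levels, in increasing order; larger alpha => smaller tree.
alphas = full_tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas

def overprune(max_nodes):
    """Return the largest cost-complexity-pruned tree within the size limit."""
    for alpha in alphas:
        candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
        if candidate.tree_.node_count <= max_nodes:
            return candidate  # first alpha under the limit gives the largest such tree
    return None

small_tree = overprune(max_nodes=31)  # hypothetical user-specified size limit
print(small_tree.tree_.node_count, small_tree.score(X_te, y_te))
```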
Similar resources
Comparison of Ordinal Response Modeling Methods like Decision Trees, Ordinal Forest and L1 Penalized Continuation Ratio Regression in High Dimensional Data
Background: Response variables in most medical and health-related research have an ordinal nature. Conventional modeling methods assume predictor variables to be independent, and consider a large number of samples (n) compared to the number of covariates (p). Therefore, it is not possible to use conventional models for high dimensional genetic data in which p > n. The present study compared th...
A New Algorithm for Optimization of Fuzzy Decision Tree in Data Mining
Decision-tree algorithms provide one of the most popular methodologies for symbolic knowledge acquisition. The resulting knowledge, a symbolic decision tree along with a simple inference mechanism, has been praised for comprehensibility. The most comprehensible decision trees have been designed for perfect symbolic data. Classical crisp decision trees (DT) are widely applied to classification t...
Predicting The Type of Malaria Using Classification and Regression Decision Trees
Maryam Ashoori (School of Technical and Engineering, Higher Educational Complex of Saravan, Saravan, Iran), Fatemeh Hamzavi (School of Agriculture, Higher Educational Complex of Saravan, Saravan, Iran). Background: Malaria is an infectious disease infecting 200-300 million people annually. Environme...
A visualization tool for interactive learning of large decision trees
Decision tree induction is certainly among the most applicable learning techniques due to its power and simplicity. However, learning decision trees from large datasets, particularly in data mining, is quite different from learning from small or moderately sized datasets. When learning from large datasets, decision tree induction programs often produce very large trees. How to visualize efficie...
Compiling large-context phonetic decision trees into finite-state transducers
Recent work has shown that the use of finite-state transducers (FST’s) has many advantages in large vocabulary speech recognition. Most past work has focused on the use of triphone phonetic decision trees. However, numerous applications use decision trees that condition on wider contexts; for example, many systems at IBM use 11-phone phonetic decision trees. Alas, large-context phonetic decisio...